Discrimination-Net for Hindi

Authors

  • Diptesh Kanojia
  • Arindam Chatterjee
  • Salil Joshi
  • Pushpak Bhattacharyya
Abstract

Current state-of-the-art Word Sense Disambiguation (WSD) algorithms are mostly supervised and use the P(Sense|Word) statistic for annotation. This statistic is obtained by training the model on a sense-annotated corpus. The performance of WSD algorithms does not yet match the efficiency and quality of human annotation, so it is important to understand the role that contextual clues play in WSD. Human beings disambiguate the sense of a word by gathering hints from the context words in its neighbourhood; contextual clues thus form the basic building block of the human sense disambiguation task. We therefore felt the need for a tool that could give us deeper insight into how the human mind disambiguates polysemous words. As mentioned above, human sense disambiguation depends heavily on finding clues in the text, which eventually lead to a winner sense. To make WSD algorithms more efficient, it is highly desirable to assimilate knowledge about the contextual clues that help identify the correct sense of a word in a given context. Hence, we developed a tool that helps a lexicographer mark the clues for disambiguating a word in a context. In its current phase, the tool lets the lexicographer select clues from the gloss and example fields of a synset and stores them in a database.
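
To make the two ingredients above more concrete, the following is a minimal Python sketch under stated assumptions, not the authors' tool or algorithm. It estimates P(Sense|Word) by relative frequency from (word, sense) pairs in a sense-tagged corpus, keeps lexicographer-marked clues in a hypothetical in-memory `ClueStore` that stands in for the clue database, and picks a sense by counting clue words found in the context, falling back to P(Sense|Word) when clue counts tie. The class and function names, the toy romanised-Hindi data and the "overlap first, then probability" scoring rule are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def estimate_sense_given_word(tagged_corpus):
    """Relative-frequency estimate of P(sense | word) from (word, sense) pairs."""
    pair_counts = Counter(tagged_corpus)
    word_counts = Counter(word for word, _ in tagged_corpus)
    probs = defaultdict(dict)
    for (word, sense), n in pair_counts.items():
        probs[word][sense] = n / word_counts[word]
    return dict(probs)


class ClueStore:
    """Hypothetical in-memory stand-in for a clue database: (word, sense) -> clue words."""

    def __init__(self):
        self._clues = defaultdict(set)

    def add_clues(self, word, sense, clue_words):
        # Clue words a lexicographer might pick from a synset's gloss and example fields.
        self._clues[(word, sense)].update(clue_words)

    def clues(self, word, sense):
        return self._clues.get((word, sense), set())


def disambiguate(word, context, probs, store):
    """Choose the sense with the most clue words in context; break ties by P(sense | word)."""
    senses = probs.get(word, {})
    if not senses:
        return None  # word never seen in the tagged corpus
    context_set = set(context)

    def score(sense):
        overlap = len(store.clues(word, sense) & context_set)
        return (overlap, senses[sense])

    return max(senses, key=score)


# Toy data (assumed): romanised Hindi "aam" as "mango" vs. "common".
corpus = [("aam", "mango")] * 3 + [("aam", "common")]
probs = estimate_sense_given_word(corpus)
store = ClueStore()
store.add_clues("aam", "mango", {"phal", "meetha"})
store.add_clues("aam", "common", {"janta", "aadmi"})

print(disambiguate("aam", ["aam", "janta", "ki", "raay"], probs, store))  # clue "janta" matches
print(disambiguate("aam", ["mujhe", "aam", "pasand", "hai"], probs, store))  # no clue: prior wins
```

Letting any clue match outrank the corpus prior is only one simple design choice; a fuller system might weight individual clues or combine clue evidence and P(Sense|Word) probabilistically.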

Similar resources

Expansion of the First Hindi-Nepali Word-Net Based Bi-Lingual Dictionary and the advancement of the Human-Machine Interface

Natural Language Processing is introducing a new era in the field of Computer Science and Machine translation. Human-machine interaction will play a very important role in the coming centuries as human dependence on machines keeps increasing. WordNet was first introduced by Miller and Fellbaum in 1985. WordNet is a lexical database for human languages. It groups the Human...

Discrimination of English to other Indian languages (Kannada and Hindi) for OCR system

India is a multilingual, multi-script country. In every state of India there are two languages: the state's local language and English. For example, in Andhra Pradesh, a state in India, a document may contain words in both English and Telugu script. For Optical Character Recognition (OCR) of such a bilingual document, it is necessary to identify the script before feeding the text w...

Orienting Attention While Training Hindi Segments

The current experiment tests for an effect of attention during phonetic learning by manipulating attentional allocation to different aspects of the phonetic signal during training. In an identification task, two groups of native English-speaking participants were trained on novel Hindi words containing unfamiliar consonants and vowels. Both groups were presented with the same auditory stimuli. One ...

An Investigation to Semi supervised approach for HINDI Word sense disambiguation

This paper investigates the Yarowsky algorithm for Hindi word sense disambiguation. The evaluation was carried out on a manually created sense-tagged corpus of Hindi words (nouns). The sense definitions were obtained from Hindi WordNet, which is developed at IIT Bombay. The maximum observed precision of 61.7 on 605 test instances corresponds to the case when both ste...

Comparison of Bayesian and Frequentist Methods in Estimating the Net Reclassification and Integrated Discrimination Improvement Indices for Evaluation of Prediction Models: Tehran Lipid and Glucose Study

Introduction: The frequency-based method is commonly used to estimate the Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) indices. These indices measure how much the performance of a statistical model improves when a new biomarker is added. This method performs poorly in some cases, especially in small samples. In this study, the performance of two Bay...

Discrimination of coronal stops by bilingual adults: the timing and nature of language interaction.

The current study was designed to investigate the timing and nature of interaction between the two languages of bilinguals. For this purpose, we compared discrimination of Canadian French and Canadian English coronal stops by simultaneous bilingual, monolingual and advanced early L2 learners of French and English. French /d/ is phonetically described as dental whereas English /d/ is described a...

Publication date: 2012